-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make it possible to allow discovery errors for controllers #49495
make it possible to allow discovery errors for controllers #49495
Conversation
d6a4e3a
to
c9423e3
Compare
/retest |
@@ -366,7 +367,10 @@ func GetAvailableResources(clientBuilder controller.ControllerClientBuilder) (ma | |||
|
|||
resourceMap, err := discoveryClient.ServerResources() | |||
if err != nil { | |||
return nil, fmt.Errorf("failed to get supported resources from server: %v", err) | |||
utilruntime.HandleError(fmt.Errorf("unable to get all supported resources from server: %v", err)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HandleError only logs here? Does this fix the issue where if I have two versions of a TPR, the cm crash loops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HandleError only logs here?
The choice is up to the admin I suppose, there's an env var to force it either way. Returning an error unconditionally fails, so this at least gives them the choice.
Does this fix the issue where if I have two versions of a TPR, the cm crash loops?
I'm not familiar with this. Got an issue handy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that we continue, go into the if clause below and return an error. What have we won with this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that we continue, go into the if clause below and return an error. What have we won with this change?
it can return an error and a the results it was able to get. Consider the aggregated case with one server of ten being down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TPR issue @mikedanese mentioned is #22768 (comment). If user created multiple versions of a TPR, only the first version is operational, so the discovery here fails and causes controller manager to crashloop.
return nil, fmt.Errorf("failed to get supported resources from server: %v", err) | ||
utilruntime.HandleError(fmt.Errorf("unable to get all supported resources from server: %v", err)) | ||
} | ||
if resourceMap == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strange interface which returns a nil map without an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at least compare the length to be a bit more complete. Also below...
@@ -187,7 +187,7 @@ func (d *DiscoveryClient) ServerResourcesForGroupVersion(groupVersion string) (r | |||
} | |||
|
|||
// serverResources returns the supported resources for all groups and versions. | |||
func (d *DiscoveryClient) serverResources(failEarly bool) ([]*metav1.APIResourceList, error) { | |||
func (d *DiscoveryClient) serverResources(_ bool) ([]*metav1.APIResourceList, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's get rid of the failEarly
argument.
c9423e3
to
b7286f3
Compare
comments addressed. |
/retest |
/release-note-none |
Can we cherrypick it to 1.7? Without it, controller manager will crashloop if user registers multiple versions for a TPR (#22768 (comment)). @deads2k @wojtek-t |
/retest |
1 similar comment
/retest |
/lgtm |
/approve no-issue |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: caesarxuchao, deads2k, sttts Associated issue requirement bypassed by: sttts The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing |
/retest |
I'm torn. Are the failures being tracked? /retest |
Only in aggregate. We're trying to bring better visibility. |
/retest |
Automatic merge from submit-queue (batch tested with PRs 49665, 49689, 49495, 49146, 48934) |
Cherrypick approved - automated cherrypick in #49767 |
Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked. |
The federated unit tests, the way are done have been udpdated in master and are no longer the preferred way of doing this test. They however are still invoked on 1.7. This is to enable the cherry pick of kubernetes#49495 on 1.7.
Update the discovery client to return partial discovery information and an error. Since we can aggregate API servers, discovery of some resources can fail independently. Callers of this function who want to tolerate the errors can, existing callers will still get an error and fail in normal blocks.
@kubernetes/sig-api-machinery-misc @sttts